Kernel density estimation of CSD distributions - an application to knowledge based molecular optimisation

Authors

  • Patrick McCabe
  • Oliver Korb
  • Jason C. Cole
  • Robin Taylor
Abstract

The Cambridge Structural Database (CSD) contains a large amount of molecular structure data (bond length, bond angle and torsion angle data). Much of this data has previously been extracted in histogram form and provided in the Mogul program. Histograms, however, have several disadvantages: they are not smooth, and they depend on bin widths and bin end points. Kernel density estimators do not bin data and have no end points; instead they centre a kernel function at each data point, and smooth kernel functions generate smooth density estimates [1]. A difficulty of the approach, though, is choosing how wide to make the kernel functions. In this work, kernel density estimation is used to generate probability density functions (pdfs) for bond length, bond angle and torsion angle histograms derived from the CSD. Gaussian kernels are used for bond length and bond angle data, and a von Mises kernel is used for the torsion angle data [2]. The resulting pdfs are smooth and are suitable for application to molecular geometry optimisation.
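The two kernel choices named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sample values, the Gaussian bandwidth and the von Mises concentration `kappa` are all assumptions chosen for illustration, and the abstract does not say how the kernel widths were selected. The sketch only shows why a von Mises kernel suits torsion angles: it is periodic, so density near +180° and -180° is pooled rather than split across an artificial boundary.

```python
import math

def gaussian_kde(x, samples, bandwidth):
    """Density estimate at x: average of Gaussian kernels centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

def bessel_i0(kappa):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    q = (kappa / 2.0) ** 2
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= q / (k * k)  # next term: (kappa/2)^(2k) / (k!)^2
        total += term
    return total

def von_mises_kde(theta, samples, kappa):
    """Periodic density estimate at angle theta (radians) from von Mises kernels."""
    norm = 1.0 / (len(samples) * 2.0 * math.pi * bessel_i0(kappa))
    return norm * sum(math.exp(kappa * math.cos(theta - s)) for s in samples)

# Hypothetical observations, not taken from the CSD.
bond_lengths = [1.52, 1.53, 1.53, 1.54, 1.55]  # angstroms
torsions = [math.radians(a) for a in (-178.0, 175.0, 179.0, 62.0, -65.0)]

# Illustrative kernel widths; choosing them well is the hard part noted above.
print(gaussian_kde(1.53, bond_lengths, bandwidth=0.01))
print(von_mises_kde(math.pi, torsions, kappa=50.0))
```

A larger `kappa` plays the same role as a smaller Gaussian bandwidth: narrower kernels and a rougher estimate. For very large `kappa` the plain `exp(kappa * cos(...))` form can overflow, so a production version would use a scaled formulation.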


Similar resources

Comparison of the Gamma kernel and the orthogonal series methods of density estimation

The standard kernel density estimator suffers from boundary bias when estimating probability density functions of distributions on the positive real line. The Gamma kernel estimators and orthogonal series estimators are two alternatives which are free of boundary bias. In this paper, a simulation study is conducted to compare the small-sample performance of the Gamma kernel estimators and the orthog...

Moment Inequalities for Supremum of Empirical Processes of U-Statistic Structure and Application to Density Estimation

We derive moment inequalities for the supremum of empirical processes of U-statistic structure and give applications to kernel-type density estimation and to estimation of the distribution function for functions of observations.

Identification of Hazardous Situations using Kernel Density Estimation Method Based on Time to Collision, Case study: Left-turn on Unsignalized Intersection

The first step in improving traffic safety is identifying hazardous situations. Traffic accident data make it possible to identify hazardous situations on roads and across the network. However, in small areas such as intersections, and especially at the resolution of individual maneuvers, identifying hazardous situations from accident data is impossible. In this paper, time-to-collision (TTC) as a traffic conflic...

Nonparametric estimation of the coefficient of overlapping - theory and empirical application

The coefficient of overlapping OVL measures the amount of agreement of two probability distributions. Statistical inference for OVL has been mainly investigated in a parametric framework. Five strongly consistent nonparametric estimators for OVL based on kernel density estimation are suggested. A Monte-Carlo simulation investigates bias and standard deviation of the estimators in finite samples...

Breast cancer diagnosis using nonparametric probability density estimation based on kernel methods

Introduction: Breast cancer is the most common cancer in women. An accurate and reliable system for the early diagnosis of benign or malignant tumors is needed. Using the results of FNA together with data mining and machine learning techniques, new methods can be designed for the early diagnosis of breast cancer that are able to detect it with high accuracy. Materials and Methods: In this study,...


Journal title:

Volume 6, Issue

Pages -

Publication date: 2014